AITopics | eluder pf

Collaborating Authors

eluder pf

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Principled Fine-tuning of LLMs from User-Edits: A Medley of Preference, Supervision, and Reward

Misra, Dipendra, Pacchiano, Aldo, Chi, Ta-Chung, Gao, Ge

arXiv.org Machine LearningJan-28-2026

We study how to fine-tune LLMs using user-edit deployment data consisting of a set of context, an agent's response, and user edits. This deployment data is naturally generated by users in applications such as LLMs-based writing assistants and coding agents. The _natural_ origin of user edits makes it a desired source for adapting and personalizing LLMs. In this setup, there emerges a unification of various feedback types namely preferences, supervised labels, and cost that are typically studied separately in the literature. In this paper, we initiate the theoretical investigation of learning from user edits. We first derive bounds for learning algorithms that learn from each of these feedback types. We prove that these algorithms have different trade-offs depending upon the user, data distribution, and model class. We then propose a simple ensembling procedure to jointly learn from these feedback types. On two domains adapted from Gao et al. 2024, we show our ensembling procedure outperforms these methods that learn from individual feedback. Further, we show that our proposed procedure can robustly adapt to different user-edit distributions at test time.

large language model, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

2601.19055

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Second Order Bounds for Contextual Bandits with Function Approximation

Pacchiano, Aldo

arXiv.org Machine LearningSep-24-2024

Many works have developed algorithms no-regret algorithms for contextual bandits with function approximation, where the mean rewards over context-action pairs belongs to a function class. Although there are many approaches to this problem, one that has gained in importance is the use of algorithms based on the optimism principle such as optimistic least squares. It can be shown the regret of this algorithm scales as square root of the product of the eluder dimension (a statistical measure of the complexity of the function class), the logarithm of the function class size and the time horizon. Unfortunately, even if the variance of the measurement noise of the rewards at each time is changing and is very small, the regret of the optimistic least squares algorithm scales with square root of the time horizon. In this work we are the first to develop algorithms that satisfy regret bounds of scaling not with the square root of the time horizon, but the square root of the sum of the measurement variances in the setting of contextual bandits with function approximation when the variances are unknown. These bounds generalize existing techniques for deriving second order bounds in contextual linear problems.

algorithm, eluder pf, probability, (13 more...)

arXiv.org Machine Learning

2409.16197

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.81)

Add feedback

Experiment Planning with Function Approximation

Pacchiano, Aldo, Lee, Jonathan N., Brunskill, Emma

arXiv.org Machine LearningJan-10-2024

We study the problem of experiment planning with function approximation in contextual bandit problems. In settings where there is a significant overhead to deploying adaptive algorithms--for example, when the execution of the data collection policies is required to be distributed, or a human in the loop is needed to implement these policies--producing in advance a set of policies for data collection is paramount. We study the setting where a large dataset of contexts but not rewards is available and may be used by the learner to design an effective data collection strategy. Although when rewards are linear this problem has been well studied [53], results are still missing for more complex reward models. In this work we propose two experiment planning strategies compatible with function approximation. The first is an eluder planning and sampling procedure that can recover optimality guarantees depending on the eluder dimension [42] of the reward function class. For the second, we show that a uniform sampler achieves competitive optimality rates in the setting where the number of actions is small. We finalize our results introducing a statistical gap fleshing out the fundamental differences between planning and adaptive learning and provide results for planning with model selection.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

2401.05193

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Add feedback